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Abstract 

The  application  of  multiscale  and  stochastic  techniques  to  the  solution  of  linear  inverse 
problems  is  presented.  This  approach  allows  for  the  explicit  and  easy  handling  of  a  variety 
of  difficulties  commonly  associated  with  problems  of  this  type.  Regularization  is  accomplished 
via  the  incorporation  of  prior  information  in  the  form  of  a  multiscaie  stochastic  model.  We 
introduce  the  relative  error  covariance  matrix  (RECM)  as  a  tool  for  quantitatively  evaluating 
the  manner  in  which  data  contributes  to  the  structure  of  a  reconstruction.  In  particular,  the 
use  of  a  scale  space  formulation  is  ideally  suited  to  the  fusion  of  data  from  several  sensors 
with  differing  resolutions  and  spatial  coverage  (eg.  sparse  or  limited  availability).  Moreover, 
the  RECM  both  provides  us  with  an  ideal  tool  for  understanding  and  analyzing  the  process  of 
muitisensor  fusion  and  allows  us  to  define  the  space-varying  optimal  scale  for  reconstruction  as 
a  function  of  the  nature  (resolution,  quality,  and  coverage)  of  the  available  data.  Examples  of 
our  multiscale  maximum  a  posteriori  inversion  algorithm  are  demonstrated  using  a  two  channel 
deconvolution  problem  formulated  so  as  to  illustrate  many  of  the  features  associated  with  more 
general  linear  inverse  problems. 


’This  work  was  supported  in  part  by  the  Office  of  Naval  Research  under  Grant  N00014-91-J-1004  and  the  Air 
Force  Office  of  Scientific  Research  under  Grant  AFOSR-92-J-0002. 

^The  work  of  this  author  was  also  supported  in  part  by  a  US  Air  Force  Laboratory  Graduate  Fellowship  and 
summer  work  performed  at  Schlumberger-Doll  Research 
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1  Introduction 

The  objective  of  a  linear  inverse  problem  is  the  recovery  of  an  underlying  quantity  given  a  col¬ 
lection  of  noisy,  linear  fimctionals  of  this  unknown.  These  problems  arise  in  fields  as  diverse  as 
geophysical  prospecting  [6,7,26,28,29,57,61],  medical  imaging  [5,33,36,37,52],  image  process¬ 
ing  [41],  groimdwater  hydrology  [8-10,46,47],  and  global  ocean  modeling  [2,44,60].  For  example,  a 
common  signal  and  image  processing  problem  is  that  of  deconvolution  where  one  observes  a  blurred 
version  of  the  signal  in  additive  noise  and  seeks  to  recover  the  xmcorrupted  original  [24,40,45,51]. 
Alternatively,  the  use  of  computer  aided  tomography,  magnetic  resonance  imaging,  and  related 
techniques  for  medical  diagnoses  has  lead  to  increased  efforts  in  the  development  of  algorithms  for 
the  inversion  of  the  Radon  transform  [36,37].  Finally,  the  exploration  of  oil  is  often  facilitated  by 
knowledge  of  the  electrical  conductivity  structure  of  a  rock  formation  [17].  The  conductivity  itself 
is  ascertained  by  estabhshing  a  magnetic  field  in  the  rock  formation  and  measuring  the  induced 
currents.  Although  this  inverse  problem  is  not  itself  linear,  a  common  approach  for  determining 
the  conductivity  requires  the  solution  of  a  sequence  of  linear  inverse  problems  [26,27,55,56]. 

While  it  is  not  difficult  to  find  practiced  instances  of  linear  inverse  problems,  it  is  often  quite 
challenging  to  generate  their  solutions  .  In  many  instances,  regularization  is  required  to  overcome 
problems  associated  with  the  poor  conditioning  of  the  linear  system  relating  the  observations  to 
the  underlying  function  [22,25,39].  This  iU- conditioning  may  be  caused  by  the  spatial  distribution 
of  data  to  be  used  in  generating  a  reconstruction  or  by  properties  inherent  in  the  linear  operator 
acting  on  the  unknown  quantity.  In  either  case,  regiilarization  serves  to  alleviate  the  iU-posedness 
of  the  original  problem  so  that  a  unique,  stable  solution  may  be  found.  Even  if  the  problem  is  not 
ill-conditioned,  a  regulaurizer  may  be  incorporated  as  a  means  of  constrzdning  the  reconstruction  to 
reflect  prior  knowledge  concerning  the  behavior  of  this  function  [41].  For  example,  it  is  corrunon 
practice  to  regularize  a  problem  so  as  to  enforce  a  degree  of  smoothness  in  the  reconstruction 
[25,31,41].  Also,  in  disciplines  such  as  geology,  the  phenomena  under  investigation  are  fractal  in 
nature  in  which  case  a  prior  model  with  a  l//-type  power  spectriim  is  used  as  a  regularizer. 

In  addition  to  the  regularization  issue,  characteristics  of  the  data  set  available  to  the  inversion 
algorithm  can  create  difficulties.  In  many  inverse  problems,  a  large  quantity  of  data  from  a  suite 
of  sensors  is  available  for  the  inversion;  however,  the  information  conveyed  by  each  measurement 
process  may  be  far  from  complete  so  that  one  is  confronted  with  the  problem  of  fusing  data  from 
several  sensors  to  achieve  the  desired  level  of  performance  in  the  inversion.  Hence,  there  is  a  need 
for  understanding  precisely  how  data  contributes  information  to  a  reconstruction  and  the  manner 
in  which  measurements  from  different  sources  are  merged  by  the  inversion  routine.  Alternatively, 
the  availability  of  the  data  often  is  limited.  For  example,  one  may  be  constrained  to  collecting 
measurements  on  the  boundary  of  a  region  while  the  quantity  of  interest  is  to  be  estimated  over 
the  interior  as  is  the  case  in  [5,6,11,61].  Here,  one  requires  flexible  inversion  algorithms  capable 
of  processing  data  possessing  spaxse  or  limited  spatial  distributions.  Additionally,  one  must  com¬ 
pensate  for  errors  present  in  the  data  which  may  arise  from  noise  in  the  measurement  apparatus, 
unknown  quantities  associated  with  the  experimental  conditions,  modeling  errors  induced  by  the 
simplification  of  physics  and  the  presence  of  nuisaince  parameters  in  the  model.  Finally,  one  must  be 
concerned  with  the  computational  complexity  of  the  inversion  algorithm.  Typically,  the  inversion 
requires  the  solution  of  a  large  system  of  linear  equations  so  that  advantage  must  be  taken  of  any 
structure  or  sparseness  present  in  the  matrices  associated  with  the  problem. 

In  this  paper  we  develop  a  framework  for  inversion  based  upon  a  multiscale  description  of  the 
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data,  the  operators,  and  the  function  to  be  reconstructed.  The  seminal  work  on  linear  operators 
and  wavelet  decompositions  is  that  of  Beylkin,  Coifman  and  Rohkhn  [4].  Their  results  on  the 
compression  of  whole  classes  of  linear  operators  in  a  nonstcmdard  wavelet  representation  is  mathe¬ 
matically  deep  and  has  many  practical  consequences  for  the  solution  of  the  forward  problem.  In  [1], 
Alpert  et.  al  formulate  a  discrete  multiresolution  analysis  which  also  performs  well  in  terms  of 
operator  compression.  Moreover,  they  develop  and  analyze  a  computationally  efficient  method  for 
constructing  and  applying  the  inverse  of  their  operator.  As  stated  however,  their  algorithm  does 
not  accoimt  for  effects  such  as  observation  noise.  Furthermore,  Alpert’s  method  does  not  allow  for 
the  incorporation  of  prior  knowledge  into  the  inversion  scheme  or  for  the  processing  of  irregularly 
spaced  data. 

More  recently,  in  [58]  Wang  ei  al.  develop  a  multiscale  deconvolution  scheme  and  apply  it  to 
both  one  and  two  dimensional  problems.  The  algorithm  in  [58]  employs  a  wavelet  representation 
of  the  data,  the  operator,  the  noise,  and  the  prior  model.  These  authors  focus  their  attention  on 
the  recovery  of  a  signal  from  a  single,  noise  corrupted,  blurred  version  of  the  original  and  in  using 
their  multiresolution  representations  for  the  purpose  of  edge  detection.  The  issue  of  multi-sensor 
data  fusion  is  not  explored  by  Wang  et  al.  Nor  are  these  authors  concerned  with  processing  sparse 
or  irregularly  sampled  data  sets.  Finally,  no  explicit  attempt  is  made  in  [58]  to  understand  and 
quamtify  the  manner  in  which  the  data  supports  but  a  limited  level  of  detsdl  in  the  reconstruction. 

The  inversion  algorithm  used  here  is  drawn  from  the  theory  of  statistical  estimation.  Such  an 
approach  allows  for  the  exphcit  modeling  of  the  errors  in  the  data  as  sample  paths  from  random 
processes.  All  prior  information  regarding  the  structure  of  the  underlying  function  is  summarized 
in  the  form  of  a  statistical  model  which  also  acts  as  a  regularizer.  Moreover,  these  techniques 
compute  not  only  the  estimate  of  the  function  of  interest,  but  also  provide  a  built-in  performance 
indicator  in  the  form  of  an  error  covciriance  matrix.  This  matrix  is  central  to  an  imdersteinding  of 
the  manner  in  which  information  from  a  set  of  observations  is  propagated  into  a  reconstruction. 

We  utilize  a  1//  fractad  prior  model  specified  in  the  wavelet  transform  domain  for  the  purposes 
of  regularization.  While  clearly  not  the  only  multiscale  model  available  for  this  purpose,  the  1// 
model  is  useful  for  a  number  of  reasons.  First,  as  noted  in  [41],  this  model  produces  the  same  effects 
as  the  more  traditional  smoothness  regulairizers.  Hence,  its  behavior  and  utility  are  well  understood. 
Second,  the  use  of  a  1//  model  utilizes  data  at  different  scales  in  an  intuitively  pleasing  manner. 
Finally,  l//-type  processes  asstime  a  psirticularly  simple  form,  easily  implemented  in  the  wavelet 
transform  domain. 

The  inversion  algorithms  developed  in  this  paper  Eire  unique  in  their  ability  to  overcome  many  of 
the  data-oriented  difficulties  associated  with  spatial  inverse  problems.  Specifically,  our  techniques 
are  designed  for  the  processing  of  information  from  a  suite  of  sensors  where  the  sampling  structure  of 
each  observation  process  may  be  spcirse  or  incomplete.  In  the  case  of  standard  time-series  analysis, 
there  exist  well  established  methods  for  merging  data  from  a  variety  of  sources  (eg.  the  Kalman 
and  multichEinnel  Wiener  filters);  however,  generalizations  of  these  ideas  for  processing  spatial  data 
with  irregular  sampling  patterns  have  been  elusive.  For  example,  traditional  Fourier  techniques 
typically  require  the  use  of  some  type  of  space-domain  windowing  or  interpolation  methods  which 
tend  to  cause  distortion  in  the  frequency  domain.  By  using  the  multiscale  approach  developed 
here,  such  preprocessing  is  unnecessEiry  thereby  avoiding  both  the  cost  of  the  operation  and  the 
distortion  in  the  transform  domain. 

Given  this  ability  to  merge  data  from  a  VEiriety  of  sources,  we  develop  a  quantitative  theory  of 
sensor  fusion  by  which  we  are  able  to  imderstand  how  information  from  a  suite  of  observations  is 
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merged  to  form  the  reconstruction.  It  is  often  the  case  that  one  wishes  to  extract  from  a  data  set  far 
more  information  about  the  underlying  function  than  is  supported  by  the  data.  The  insight  provided 
by  our  analysis  can  be  used  to  control  such  signal  processing  greed  by  defining  the  optimal  scale  of 
reconstruction  as  a  function  of  (1)  the  physics  relating  the  unknown  quantity  to  the  measurements 
and  (2)  the  spatial  coverage  and  measurement  quality  of  the  data  each  observation  source  provides. 
In  general,  such  an  approach  leads  to  a  space- varying  optimed  scale  of  reconstruction  which  allows 
for  the  recovery  of  line  scale  detail  only  where  the  data  supports  it.  At  other  spatial  locations,  a 
coarser  approximation  to  the  function  is  generated.  In  the  multisensor  case,  not  only  can  a  space- 
varying  optimal  scale  of  reconstruction  be  defined,  but  at  any  point  in  space  and  scale,  only  data 
from  those  sources  contributing  significant  information  need  be  processed.  Thus,  the  computational 
bxirden  associated  with  performing  the  inversion  can  be  reduced.  Also,  our  techniques  are  useful 
for  capturing  the  incremental  benefits  associated  with  the  addition  of  information  from  a  set  of 
observations  to  a  reconstruction  based  upon  data  from  a  different  group  of  sensors.  Finally,  we 
note  that  our  use  of  a  multiscale  representation  of  the  operators  defining  the  inverse  problem  leads 
to  sparse  linear  systems  in  the  transform  domain.  Hence,  the  work  of  Beylkin  et  al  [4]  suggests 
that  highly  efficient  techniques  are  available  for  obtaining  the  estimate  given  a  set  of  data. 

The  remciinder  of  this  paper  is  organized  as  follows.  In  Section  2  we  formulate  the  multisensor 
linear  inverse  problem  and  discuss  its  transformation  to  scale  space.  Section  3  is  devoted  to  a 
presentation  of  the  estimation-theoretic  techniques  to  be  used  for  performing  the  inversion  and 
analyzing  sensor  fusion.  A  set  of  excimples  highlighting  the  contributions  of  this  work  are  presented 
in  Section  4.  Finally,  directions  for  future  work  and  conclusions  Eire  given  in  Section  5. 

2  Problem  Formulation 

2.1  The  Observations  Processes 

In  this  work,  it  is  assumed  that  the  the  data  upon  which  the  inversion  is  to  be  based,  yi{x),  is  related 
to  the  function  to  be  reconstructed,  g{x),  via  a  system  of  linear  integral  equations  embedded  in 
additive  noise.  Hence  the  observation  model  to  be  considered  is 

yi{x)  =  j  Ti{x,x')g{x')dx' +  ni{x)  i=l,2,...K.  (1) 

where  the  integral  kernels,  Ti{x, «'),  and  the  characteristics  of  the  noise  processes  ni{x)  are  known. 
The  variable  x  could  represent  one,  two,  or  three  spatial  dimensions.  As  a  first  step  in  understanding 
the  advantages  and  utility  of  a  multisccde,  stochastic  approach  to  the  solution  of  systems  of  equation 
of  the  form  given  in  (1),  oidy  ID  problems  are  to  be  considered  here. 

The  noiseless  version  of  (1)  is  known  as  a  first  kind  integral  equation  of  either  the  Fredholm  or 
Volterra  variety  depending  upon  the  limits  of  integration.  This  type  of  structure  arises  frequently 
when  considering  physical  systems  described  by  ordinary  or  partial  differential  equations  [23,50]. 
Additionally,  such  relationships  may  be  enco\mtered  as  a  result  of  linearization  of  a  second  kind 
integral  equation  [32,34,54].  When  Ti{x,  x')  =  Ti{x  -  x'),  the  problem  of  finding  g  based  upon  yi  is 
known  as  a  deconvolution  problem  and  is  encountered  widely  in  practice  [24,35,40,45].  Thus,  the 
mathematical  structure  to  be  considered  in  this  paper  is  quite  general  and  may  be  used  to  describe 
a  wide  variety  of  practical  problems. 

A  key  feature  of  the  linear  integral  equation  modeling  structure  is  its  flexibility.  By  specifying 
the  structure  of  the  kernels,  multisensor  fusion  problems  can  be  described  wherein  the  data  from 
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Figure  1:  Convolutional  Kernel  Functions 


individual  sources  conveys  information  about  g  at  a  variety  of  spatial  scales.  For  example,  in 
Section  4,  a  two  channel  problem  is  considered.  The  kernel  functions  in  this  case  satisfy  Ti(x,  x')  — 
Ti{x  —  x')  =  Ti{$)  for^  i  G  {/,  c}  and  are  plotted  in  Figure  1.  The  kernel  labeled  Tf  gives  essentially 
pointwise  observations  thereby  supplying  fine  scale  data  for  the  inversion.  Alternatively,  Tc  performs 
a  local  averaging  of  the  function  g  so  that  provides  coarse  scale  information  regarding  the 
structure  of  g. 

The  maimer  in  which  information  from  each  of  these  data  sources  is  used  in  an  inversion  is 
affected  by  both  its  quahty  and  quantity.  The  quality  of  the  data  is  determined  by  the  level  of 
noise,  n^,  present  in  the  signal  (1),  where  the  eire  taken  to  be  zero  mean  white  Gaussian  noise 
sources  with  intensities  r^.  Generally,  the  larger  the  noise  intensity,  the  less  reliable  will  be  the  data. 
The  quantity  of  data  refers  to  the  number  and  distribution  of  samples  available  to  an  algorithm.  In 
practice,  a  data  set  is  composed  of  a  finite  number  of  samples,  yi{xj)  j  =  0,  1,  . . .  Ni  contained  in 
some  finite  interval  of  the  real  line  where  we  will  denote  by  y,  the  Ni  dimensional  vector  composed 
of  all  of  the  data  samples  from  the  observation  process.  Clearly,  altering  the  number  or  location 
of  the  Xj  changes  the  nature  of  the  information  conveyed  by  the  data  thereby  impacting  the  way  in 
which  a  psirticular  observation  process  contributes  to  a  reconstruction.  In  Section  4,  we  illustrate 
several  variations  of  data  quality  and  spatial  distribution  for  the  two  chtumel  problem  mentioned 
previously  which  are  illustrative  of  physically  meciningful  measurement  configurations  and  which 
allow  us  to  demonstrate  the  capabilities  of  our  formalism  both  in  exposing  the  resolution  tradeoffs 
in  multisensor  fusion  and  in  deahng  with  nonuniform  sampling  patterns  to  which  standard  Fourier- 
based  deconvolution  methods  are  inapplicable. 

2.2  A  Wavelet  Representation  of  g[x) 

A  multiscale  representation  of  g(x)  is  obtained  via  the  use  a  wavelet  expansion.  We  begin  with  two 
assumptions.  First,  g  is  taken  to  be  “scale-hmited”  so  that  there  exist  both  a  finest  scale  for  the 
reconstruction.  Mg,  beyond  which  additional  detail  is  either  not  present  or  cannot  be  resolved  given 
the  data  and  a  coarsest  scale,  Lg,  of  interest.  Second,  we  assume  that  g{x)  is  only  to  be  recovered 
for  I  in  a  closed  and  bounded  interval  of  the  real  line.  Then,  with  (p{x)  and  representing, 

^Note  that  throughout  this  paper  the  subscript  /  is  used  to  denote  quantities  associated  with  the  fine  scale 
observation  process  while  the  subscript  c  is  used  for  the  coarse  scale  measurements. 
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respectively,  the  scaling  and  wavelet  functions  for  a  compactly  supported  orthonormal  wavelet 
decomposition  [15],  we  can  represent  g{x)  in  terms  of  its  approximations  at  any  scale  Lg  <  m  <  Mg 
and  the  detail  at  successively  finger  scales  m  <  k  <  Mg  -  1 

V,(m)  Mg-lN,(k) 

3i^)=  Z]  ^  7(m,n)V>m,n(a:)  (2) 

n=0  k=:m  n=rO 

where  tprn.ni^)  appropriately  scaled  and  shifted  version  of  'tp{x)  and  ifi{x)  (i.e. 

^m,n(®)  =  -  n))  and  where  A’j(m)  denotes  the  finite  number  of  terms  in  the  expansion 

at  the  scale. 

Note  that  if  m  =  Mg,  the  double  summation  disappears,  and  we  have  a  representation  for 
g{x)  in  terms  of  its  finest  scale  scaling  coefficients  g{Mg,n).  At  the  other  extreme,  we  have  that 
with  m  =  Lg,  (2)  represents  g{x)  in  terms  of  its  coeirsest  scale  scaling  coefficients,  g{Lg,n),  and 
its  wavelet  coefficients,  ~f{k,n),  at  all  scades  of  interest,  Lg  <  k  <  Mg  —  1.  Furthermore,  we  also 
have  the  scale-recursive  relationship  for  the  scaling  coefficients  g{m,  n)  that  arises  directly  from  the 
dilation  equations  [15]  for  i^(z)  and  ip{x) 

¥’(*)  =  -  n) 

n 

’/’(*)  =  -  n) 

n 

where  h{n)  and  g{n)  are  the  finite  length  sequences  associated  with  this  wavelet  basis.  If  we  now 
collect  all  coefficients  at  individual  scales  into  vectors,  i.e.  we  define  g{m)  (resp.  'fim))  to  be  the 
vector  of  scaling  (resp.  wavelet)  coefficients  of  the  function  g{x)  at  scale  m,  we  have  the  discrete 
wavelet  transform  (DWT),  as  described  in  [4],  relating  g(Tn  -f-  1)  to  g{m)  and  7(m): 

g{m)  =  H{Tn)g{m+ 1)  (3) 

■y{Tn)  -  G{m)g{m+ 1)  (4) 

+  1)  =  +  G^(m)7(m)  (5) 

where  H{m)  and  G{m)  are  matrices  formed  from  the  low-  and  high-pass  filtering  coefficients  h{n) 
and  g{n),  respectively.  Also,  since  g{x)  is  considered  only  over  a  compact  interval,  we  need  to  deal 
with  the  edge  effects  in  the  wavelet  transform  at  the  ends  of  the  interval.  While  there  are  a  variety 
of  ways  in  which  to  do  this,  such  as  modifying  the  wavelet  and  scaling  fimctions  at  the  ends  of 
the  interval  in  order  to  provide  an  orthogonal  decomposition  over  the  interval  [16],  we  have  chosen 
here  to  use  one  of  the  most  commonly  used  methods  [4],  namely  that  of  cychcaUy  wrapping  the 
interval  which  induces  a  circulzint  structure  in  H{m)  and  G{m).  While  this  does  introduce  some 
edge  effects,  these  are  of  negligible  importance  for  the  objectives  and  issues  we  wish  to  emphasize 
and  explore  and  for  the  applications  considered  here  (or  in  general  if  the  support  of  the  scaling  and 
wavelet  functions  at  the  coarsest  scale,  Lg,  of  interest  is  small  compared  to  the  overall  length  of  the 
interval.)  Further,  the  methods  we  describe  can  be  readily  adapted  to  other  approaches  for  dealing 
with  edge  effect  as  in  [16]  and  the  references  contained  therein. 

Equations  (3)  and  (4)  suggest  that  we  may  construct  a  matrix^  Wg  from  H'(m)  and  G(m)  which 
relates  the  finest  scale  scaling  coefficients,  g  =  g{Mg),  to  the  coarsest  scaling  coefficients,  g{Lg), 

^We  choose  to  subscript  the  wavelet  transform  operator  here  as  Wj  to  make  explicit  that  this  is  the  transform  for 
g{s).  We  may  (and  in  fact  ■mill)  use  different  wavelet  transforms  for  the  various  data  sets,  yi 
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Figure  2:  A  sample  lattice  structure  corresponding  to  a  D4  wavelet  transform.  The  finest  scale  is 
taken  as  Mg  while  the  coarsest  is  Lg.  Dotted  line  connections  link  nodes  from  one  side  of  the  lattice 
to  the  other  and  arise  from  the  particular  implementation  of  the  transform  used  here. 

and  aU  intervening  detail  coefficients  7(m)  for  m  =  Lg,  Lg  +  1,  . . . ,  Mg  -  1.  That  is,  we  may  write 

7  =  Wjff  (6) 

where  7  =  ['y{Mg  —  1)^  .  .  .  ^{LgY  g{Lg)'^Y'  ^3  satisfies  WjWj  =  I.  We  refer  to  the  vector 

7  as  the  wavelet  transform  of  the  function  g{x). 

Given  this  implementation  of  the  DWT,  the  relationships  among  the  scale  space  component  in 
the  decomposition  of  g  are  graphically  represented  in  the  form  of  a  lattice  as  shown  in  Figtire  2  for 
the  case  of  a  wavelet  decomposition  with  h{n)  and  g{n)  of  length  4  (such  as  the  so-called  “D4”  or 
Daubechies  4-tap  wavelet  decomposition  described  in  [15].)  At  the  finest  scale,  the  nodes  represent 
the  finest  set  of  scaling  coefficients.  Each  node  at  all  other  scales  contains  one  wavelet  and  one 
scaling  coefficient.  Two  nodes  axe  connected  by  an  arc  if  and  only  if  there  is  a  linear  relationship 
between  the  contents  of  these  nodes  as  dictated  by  the  structure  of  the  wavelet  transform  matrix 
Wj.  An  ordering  is  assumed  for  the  nodes  of  the  lattice  starting  at  the  lower  left  corner  of  the  finest 
scale,  proceeding  to  the  right  and  then  continuing  with  the  leftmost  node  at  the  next  coarsest  scale 
etc. 

A  cocirse  scale  node  is  said  to  impact  a  finer  scale  if  there  exists  a  strictly  downward  path  on 
the  lattice  from  the  former  to  the  latter.  We  define  the  upwcird  impact  set  associated  with  the 
node  {Mg,i)  (i.e.  the  node  at  scale  Mg  and  shift  i)  as  the  set  of  all  nodes  which  impact  {Mg,i)  and 
denote  this  set  as  U{Mg,  i)  [U  for  “upward”.)  Thus  in  Figure  2  the  set  of  nodes  labeled  using  a 
correspond  to  U  for  the  node  given  by  a  “Q.”  Alternatively,  for  node  (m,  j)  which  is  not  located 
at  the  finest  scale,  'D{ni,j)  {V  for  “downward”)  is  taken  as  the  set  of  finest  scale  nodes  which  this 
node  ultimately  impacts.  Thus  in  Figure  2,  P(n)  is  comprised  of  all  nodes  marked  with  the  symbol 

a||» 


2.3  Transformation  of  the  Integral  Equation  to  Wavelet  Space 

Transformation  of  cin  integral  equation  of  the  form  considered  in  (1)  to  the  wavelet  transform 
domain  begins  with  its  discretization.  In  practice,  discretization  with  respect  to  x  is  performed 
a  priori  as  the  data  yi{x)  axe  avciilable  orJy  at  a  finite  set  of  points  as  discussed  in  Section  2.1. 
By  using  a  wavelet  expansion  of  ^(a:),  we  relate  the  samples  yi{xj)  to  the  finest  set  of  scaling 
coefficients  of  g{x).  Substituting  (2)  with  m  =  Mg  into  (1)  and  reversing  the  order  of  integration 
and  summation  yields  the  matrix- vector  relation: 


yi  =  Tig  +  Ui 


(7) 
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(a)  The  convolution  matrix,  T/  (b)  The  convolution  matrix,  Tc 


Figure  3:  Grayscale  plots  of  the  convolution  matrices  T/and  Tc-  Darker  coloring  indicated  larger 
magnitudes.  The  concentration  of  Tf  near  the  diagonal  implies  that  yf  =  Tjg  +  Uf  represents 
close  to  pointwise  observation  of  g  and  therefore  wiU  convey  “fine  scale”  information  regarding  the 
structure  of  g.  Alternatively,  T  essentially  conveys  “coarse  scale”  information  about  g  as  much  of 
the  fine  scale  variation  in  g  is  removed  under  the  averaging  action  of  this  operator. 


where  the  (a,/3)  element  of  the  matrix  T  is 

=  j  Ti{xc.,x')ipM„g{^)dx' . 

The  matrices  Tf  and  Tc  corresponding  to  the  two  convolutional  kernel  functions  of  Figure  1  are 
displayed  in  Figures  3(a)  and  3(b). 

Equation  (7)  relates  the  finest  scale  scaling  coefficients  of  g{x)  and  the  samples  of  the  noise 
processes  to  the  samples  of  the  observation  process  For  the  purposed  of  the  inversion,  we  desire 
a  relationship  between  the  wavelet  transform,  7,  of  y  £md  a  multiscale  representation  of  to  a 
multiscale  representation  of  the  data.  Toward  this  end,  we  must  define  a  discrete  wavelet  transform 
operator  that  transforms  the  vector  of  sampled  measurements,  pi,  into  its  wavelet  decomposition 

=  WiTWjj  +  W.n, 

=  ©<7  +  Vi  (8) 

where,  as  before,  consists  of  a  coarsest  scale  set  of  scaling  coefficients,  yi(Li),  at  scale  Li  and 
a  complete  set  of  finer  scale  wavelet  coefficients  T]i(m),  Li  <  m  <  Mi  —  where  Mi  is  the  finest 
scale  of  representation.  Note  that  we  can  think  of  this  transform  as  a  purely  discrete  one,  taking 
the  sequence  of  values  yi{xj)  ;  =  1,  2  , . . .  to  the  elements  of  T]i.  Alternatively,  since  the  original 
data  are  samples  of  (1),  we  can  think  of  the  raw  data  as  empirically  obtained  scaling  coefficients  at 
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Quantity 

Wavelet  Transform 

Wavelet  Coefficients 

Scaling  Coefficients 

Data  j/i 

Vi  = 

Viim) 

yi{m) 

Function  g{x) 

II 

j{m) 

g(m) 

Noise  rii 

t/i  =  WiUi 

Vi{m) 

ni(m) 

Table  1:  Notation  for  wavelet  and  scaling  coefficient  vectors 


some  finest  scale  Mj  in  a  wavelet  representation  of  the  functions  yi(a:)  and  nj(x).  In  [18],  Donoho 
provides  a  rigorous  discussion  of  the  relationship  between  the  theoretical  scaling  coefficients  defined 
in  terms  of  integrals  of  yi{x)  and  wavelet  functions  aind  the  samples;  however,  for  the  purposes  of 
the  work  in  this  paper,  such  distinctions  in  the  interpretation  of  (8)  are  of  secondciry  importance. 

In  Table  1,  we  have  summarized  the  notation  that  we  will  use.  For  example,  for  the  data  j/i,  the 
corresponding  wavelet  transform  rji  ~  WiHi  consists  of  wavelet  coefficients  T/i(m),  Li  <  m  <  M^  —  I, 
and  coarsest  scale  scaling  coefficients  yi{Li).  Also,  if  we  form  only  partial  wavelet  approximations 
from  scale  Li  through  scale  m,  the  corresponding  scaling  coefficients  (which  are  obtained  from 
yi{Li)  and  Tji{k),  Li  <  k  <  m  -  1)  au-e  denoted  by  yi{m).  We  adopt  the  analogous  notation  for  the 
function  g  and  the  noise  where  in  general  we  use  the  letters  [y,  g,  n)  for  the  original  data  and 
scaling  coefficients  and  their  Greek  counterparts  (17,  7,  v)  for  the  full  wavelet  transforms  and  the 
wavelet  coefficients. 

Finally,  it  is  often  useful  to  work  with  the  “stacked”  system  of  data  y  =  Tg  +  n  where  y  contains 
the  information  from  all  sensors  and  is  given  by 

y  =  [ylyl  ■■■ylV 

T  =  [Tf  Tj 
n  -  [n[  . .  .n^]^. 

In  the  transform  domain,  the  corresponding  equation  is 


T]  =  Qj  +  1/ 


(9) 


with  77,  0,  and  1/  are  defined  in  the  obvious  manner. 

3  Multiscale,  Statistical  Inversion  Algorithms 

3.1  A  Maximum  a  posteriori  Approach  to  Inversion 

A  traditional  technique  for  solving  linear  inverse  problems  of  the  form  y  =  Tg  +  n  is  to  choose  the 
estimate  of  g  according  to 


gtrad  =  argmin \\y  -  Tg\W_^  +  X\\Lg\\]  (10) 

3 

where  1|®||^  =  x'^Ax.  Equation  (10)  indicates  that  the  estimate  of  g  is  influenced  by  two  factors. 
The  first  term  enforces  fidelity  to  the  data  where  the  weighting  R~^  is  related  to  the  queuitity  of 
noise  in  the  data.  The  second  term  in  (10)  is  used  to  regularize  the  problem  in  the  event  that  T  is 
iU-conditioned.  Alternatively,  this  term  may  be  viewed  as  a  means  of  requiring  the  reconstruction 
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to  reflect  some  prior  knowledge  of  the  nature  of  In  either  case  the  regularization  or  the  prior 
knowledge  is  captured  in  the  structure  of  the  matrix  L.  Typically,  this  matrix  is  chosen  so  that 
some  degree  of  smoothness  is  present  in  gtrad  iii  which  case  L  is  taken  as  a  discrete  form  of  an 
appropriate  differential  operator  [3,41].  The  scalar  factor  A  is  used  to  determine  which  of  the  two 
terms  in  (10)  exerts  the  most  influence  in  the  reconstruction.  FinriUy,  the  optimization  problem 
given  by  (10)  admits  a  solution  which  defines  gtrad  in  terms  of  the  normal  equations 

{T'^R~^T  +  L^L)gtrad  =  T^R-^y-  (11) 

In  this  paper,  we  choose  to  approach  the  inverse  problem  from  a  statistical,  estimation-theoretic 
perspective.  That  is,  given  the  observations,  yi,  along  with  probabilistic  models  describing  the  noise 
processes  and  the  function  to  be  reconstructed,  the  problem  is  to  determine  a  statistically  optimal 
estimate  for  g.  Mathematically,  this  approach  leads  to  a  similar  set  of  normal  equations  as  those 
defined  in  (11)  so  that,  if  one  wishes,  the  reconstruction  of  g  generated  by  either  method  can  be 
made  the  same.  However,  the  combination  of  this  probabilistic  approach  and  the  use  of  a  multiscale 
framework  allows  for  much  more.  The  probabilistic  methods  generate  not  only  an  estimate  of  g, 
but  also  an  error  covariance  matrix,  P,  which  is  used  to  evaluate  the  accuracy  of  the  estimator 
in  reconstructing  g.  This  quantitative  performance  indicator  plays  a  key  role  in  developing  a 
rigorous  approach  to  the  understanding  of  the  ways  in  which  each  observation  process  contributes 
information  to  estimate  of  g  and  how  data  from  different  sources  are  fused. 

From  a  statistical  estimation  perspective,  the  normal  equations  are  obtained  by  defining  the 
reconstruction  as  the  Maximum  a  posteriori  (MAP)  estimate  of  g  under  the  condition  that  n  ~ 
A/^(0,  R)^  and  the  assumption  that  g  has  a  prior  probabilistic  distribution  A/’(0,  Pq).  In  this  work, 
each  Tii  comprising  the  vector  n  is  taken  to  be  a  zero-mean,  white  Gaussian  random  vector  with 
intensity  r^.  Now,  for  Pq  positive  definite,  the  MAP  estimate  is  defined  according  to  [41] 

gMAP  =  argminljy  -  -f  (12) 

9 

Thus,  Qmap  satisfies  normal  equations  of  the  form 

{T'^R-^T  +  Po~’'’'‘Po~"'")gMAP  =  T^R-^y.  (13) 

Finally,  defining  Po  and  R  to  be  the  wavelet  transforms  of  Pq  and  R  respectively  (i.e.  Pq  =  WjPoWj 
and  similarly  for  R)  allows  the  normeil  equations  to  be  written  in  the  wavelet  transform  domain  as 

(0^p-^0  -b  p;’^'^Pq^i^)^map  =  q'^rS-  (14) 


3.2  Multiscale  Prior  Models 

By  comparing  (11)  with  (13),  it  is  clear  that  the  choice  Pq  ~  {X^L'^L)~^  results  in  gnAP  =  gtrad- 
Recent  work,  however,  suggests  that  there  exist  a  wide  array  of  useful  prior  models  which  are 
specified  directly  in  scale  space  [13,41].  In  many  cases,  these  models  perform  essentially  the  same 
function  as  the  smoothness-based  regularizers;  however,  they  also  Ccurry  a  variety  of  additional 
benefits: 

^The  notation  x  ~  A^(m,  P)  indicates  that  the  random  vector  x  has  a  Gaussian  distribution  with  mean  m  and 
covariance  matrix  P. 
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•  They  are  exceptionally  easy  to  implement  [59]. 

•  They  lead  to  scale-space  algorithms  which  are  orders  of  magnitude  more  efficient  than  those 
estimation  schemes  operating  in  real-space  using  a  regularizer  based  upon  some  differential 
operator  [41]. 

•  They  are  fractal  in  nature  thereby  providing  realistic  models  for  a  variety  of  naturally  occur¬ 
ring  phenomena  [59] . 

To  motivate  the  pcirticular  choice  of  prior  model,  consider  taking  Pq  =  with  L 

representing  first  order  differentiation.  This  implies  that  ^  is  a  Brownian  motion  satisfying  Lg  =  w 
with  w  ~  A^(0,A~^jr).  As  discussed  in  [41],  work  by  WorneU  and  others  has  demonstrated  that 
Brownian  motions  and  other  related  fractal  processes  can  be  closely  approximated  via  a  Karhunen- 
Loeve  type  of  expansion  of  the  form  of  (2)  with  7(m,  n)  ~  A/’(0,  cr^2~'*”*)  and  independent.  Here, 
controls  the  overedl  magnitude  of  the  process  while  the  parameter  fj,  determines  the  fractal  structure 
of  sample  paths.  The  case  n  =  0,  corresponds  to  g{x)  being  white  noise  while  as  fi  increases,  the 
sample  paths  of  g  show  greater  long  range  correlation  and  smoothness. 

In  addition  to  defining  the  scale- varying  probabilistic  structure  of  the  wavelet  coefficients  in 
(2),  we  also  must  provide  a  statistical  model  for  the  coarsest  scale  scaling  coefficients,  g{Lg,n),  in 
(2).  Roughly  speaking,  these  coarse  scale  coefficients  describe  the  DC  and  low-frequency  structure 
of  g{x).  In  the  applications  we  consider  here,  we  assume  that  we  have  little  a  priori  knowledge 
concerning  the  long-term  average  vadue  of  g{x).  Consequently,  we  take  g{Lg,Tn)  ~  AA(0,p£,^)  where 

is  some  sufficiently  large  number.  By  choosing  in  this  manner,  we  avoid  any  bias  in  the 
estimator  of  the  low  frequency  structure  of  Sf(®). 

Obviously,  other  choices  of  statistics  for  7(7x1,71)  and  g[Lg,n)  may  be  appropriate  in  specific 
applications,  and  our  methodology  can  readily  accommodate  these.  The  specific  choice  we  have 
made,  leading  to  a  l//-like  fractal  model,  is  particularly  well  adapted  to  the  multiscale  formulation 
of  many  inverse  problems.  Coarse  scale  wavelet  coefficients  are  assumed  to  have  high  variances 
so  that  the  data  rather  than  prior  assumptions  influence  most  strongly  the  the  reconstruction  at 
these  scEdes.  Furthermore,  the  self-similar  scaling  law  in  the  variance  of  the  wavelet  coefficients 
is  well-adapted  to  many  physical  phenomena  that  display  fractEd-Uke  behavior.  In  addition,  the 
successively  decreasing  variEinces  of  the  fine  scale  wavelet  coefficients  control  the  incorporation  of 
high  frequency  information  into  the  reconstruction.  For  mEmy  problems  however,  this  represents 
am  eminently  reasonable  use  of  the  data.  As  will  be  seen  in  Section  4  for  deconvolution  problems, 
the  smoothing  action  of  the  convolutional  kernels  implies  that  the  data  supphes  primarily  coarse 
scale  information  regeurding  the  structure  of  g,  with  successively  decreasing  sensitivity  to  finer  scale 
Veiriations  in  g.  The  value  of  this  fine  scale  sensitivity,  of  course,  depends  not  only  on  the  sensitivity 
of  the  measurements  to  fine  scale  fluctuations  in  but  also  on  the  expected  size  of  fine  scale  detail 
in  relation  to  the  corresponding  scale  of  noise  fluctuations.  The  particular  choice  of  a  fractal  model 
provides  us  with  one  physicEilly  meaningful  way  in  which  to  specify  the  tradeoff  and  which  in  turn 
determines  the  way  in  which  the  resulting  estimation  Edgorithm  makes  effective  use  of  the  data  only 
over  those  scales  where  useful  information  is  present. 

To  summEurize,  a  fractal  prior  model  is  used  in  this  work  as  a  meEins  of  regularizing  the  linear 
inverse  problems.  Following  the  notation  introduced  in  Section  2,  the  model  is  defined  in  the 
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wavelet  transform  domain  as  7  ~  A^(0,  Pq)  where 

Po  =  block  diagiPoiM,  -  1),  ,  PoiL,),  Po{L,))  (15) 

Po{m)  =  (16) 

Poim)  =  Pl,In,(l,)  (17) 

with  In  am  n  X  n  identity  matrix.  Finally,  we  note  that  this  model  is  but  an  example  (albeit 
an  important  example)  of  a  rich  class  of  models  which  may  be  defined  in  scale  space.  Indeed, 
letting  both  a  as  well  as  /x  to  be  fimctions  of  scale  and/or  position  could  allow  for  the  modeling  of 
nonstationary  processes  possessing  space- varying  fractal  characteristics  such  as  multifractals  [19,30]. 
More  generally  in  [12, 13, 41,42],  the  authors  have  developed  a  set  of  multiscale  models  outside  of 
the  wavelet  formalism  defined  on  trees.  These  models  offer  a  compact  and  useful  characterization  of 
many  commonly  occurring  stochastic  process  and  are  well  suited  to  highly  efficient,  scale-recursive 
estimation  algorithms. 

3.3  The  Relative  Error  Covariance  Matrix 

A  key  advantage  of  the  use  of  statistical  estimation  techniques  is  the  ability  to  produce  not  only 
the  estimate  but  also  an  indication  as  to  the  quality  of  this  reconstruction.  Associated  with  the 
MAP  estimator  is  the  error  covariance  matrix,  P,  defined  in  the  transform  domain  as 

P  =  P[(7  -  7)^(7  -  7)] 


cind  which  under  the  Gaussian  models  defined  in  Section  3.1  taJces  the  form 

P  =  (0^p-'0  4- Po-')"^  (18) 

Taking  the  inverse  wavelet  transform  of  (18)  gives  the  error  covariance  matrix  P  associated  with 
estimating  g  from  data  Pi  and  a  prior  model  with  covariance  Pq 

P  =  E[{g-gf{g-g)] 

=  (r^p-^T  +  Po"^)-'  (19) 

The  diagonal  components  of  P,  the  error  variances,  are  commonly  used  to  judge  the  performance  of 
the  estimator.  Large  values  of  these  quantities  indicate  a  high  level  of  uncertainty  in  the  estimate 
of  the  corresponding  component  of  7  while  small  error  variances  imply  that  greater  confidence  may 
be  placed  in  the  estimate. 

While  the  information  contained  in  P  is  certainly  important  for  evaluating  the  absolute  level 
of  uncertainty  associated  with  the  estimator,  in  many  cases,  it  is  more  useful  to  imderstand  how 
data  serves  to  reduce  tmcertainty  relative  to  some  reference  level.  That  is,  we  have  some  prior  level 

of  confidence  in  our  knowledge  of  7  and  we  seek  to  comprehend  how  the  inclusion  of  additional 

data  in  our  estimate  of  7  alters  our  uncertainty  relative  to  this  already  established  level.  In  this 
section  we  define  the  relative  error  covariance  matrix  (RECM)  and  demonstrate  its  utility  as  a  tool 
for  capturing  such  changes  in  imcertainty.  The  analysis  of  the  RECM  in  the  wavelet  domain  is 
especially  interesting  because  it  allows  for  a  localized  characterization  of  the  maimer  in  which  data 
impacts  a  reconstruction.  Hence,  we  show  how  the  RECM  provides  a  natural  means  of  evaluating 
the  appropriate  level  of  detail  as  a  function  of  position  which  can  be  supported  in  a  reconstruction 
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based  upon  a  given  set  of  data.  When  miiltiple  measurement  processes  provide  data,  the  relative 
error  covariance  matrix  is  useful  for  determining  those  scales  and  shifts  for  which  there  exists 
significant  incrementzil  benefit  from  the  addition  of  data  from  a  given  suite  of  observations  to  an 
estimate  based  upon  information  from  a  different  set  of  sources.  Finally,  analysis  of  the  RECM 
leads  directly  to  a  quantitative,  multiscale  theory  of  sensor  fusion. 

The  definition  of  the  relative  covariance  matrix  is  motivated  by  the  definition  of  the  relative 
difference  between  two  scalars  a  and  6  given  by 


The  matrix  Einalog  to  (20)  to  be  considered  in  this  paper  is 

n(A,5)  =  7- (21) 

where  Fa  is  assumed  to  be  positive  definite.  Here  A  and  B  axe  index  sets  with  A,B  C  {1,2,...,  K}. 
The  quantity  Pa  (resp.  Pb)  is  the  error  covariance  matrix  associated  with  the  MAP  estimate  7(A) 
(resp.  7(5))  where  7(A)  (resp.  7(H))  is  the  estimate  of  7  based  upon  data  from  aU  observation 
processes  rji  with  i  £  A  (resp.  i  G  B.)  Finally,  we  define  the  error  covariance  matrix  associated 
with  no  observations,  P{0},  as  the  prior  covariance  matrix  Pq. 

The  definition  of  n(A,  B)  in  (21)  possesses  many  pleasing  properties.  First,  like  an  error  co- 
variance  matrix,  it  is  symmetric.  Also  n(A,  B)  is  the  wavelet  trzinsform  of  the  varieince  reduction 
matrix  associated  with  Pa  and  P^.  That  is, 

n(A,p)  =  7-p;’'/^PBP;^^^ 

=  w^n(A,  B)W 

Moreover,  it  is  not  difficult  to  show  that  n(A,  B)  is  normalized  to  the  extent  that  for  A  C  B, 

0  <  n(A,P)  <  I. 

We  note  that  n(A,  P)  =  0  iff  Pb  =  Pa  which  indicates  no  reduction  in  imcertainty  and  the  complete 
lack  of  additional  information  from  the  data  in  B  relative  to  that  in  A.  Alternatively,  given  some 
nonzero  level  of  uncertainty  in  7(A),  n(A,  P)  =  7  if  and  only  if  Pb  =  0  which  occurs  if  and  only  if 
7  =  7.  Thus  n(A,  P)  is  the  identity  only  when  all  uncertcdnty  in  7  has  been  removed. 

In  the  event  Pa  is  diagonal,  the  diagonal  components  of  n(A,  P)  are  particularly  easy  to  in¬ 
terpret.  Let  (Ti{A)  be  the  error- variance  of  the  component  of  7  arising  from  an  estimate  based 
upon  data  from  set  A.  Then,  the  component  of  the  diagonal  of  n(A,  P)  is  just 

which  is  nothing  more  than  the  relative  size  difference  of  the  error- variance  in  the  component  of 
7  based  upon  data  from  sets  A  and  P.  Note  that  the  diagonal  condition  of  Pa  is  met  in  this  paper 
when  Pa  =  Pq,  since  the  wavelet  and  scahng  coefficients  in  (2)  are  uncorrelated  for  the  fractal  1// 
priors  used  here  as  well  as  for  many  other  physically  meaningful  prior  models.  Thus,  the  diagonal 
elements  of  n({0},  P)  represent  the  decrease  in  uncertcdnly  due  to  the  data  from  set  P  relative  to 
the  prior  model.  Finally,  as  n({0},  P)  will  be  of  interest  frequently  in  the  remainder  of  this  work, 
we  shall  abuse  notation  and  write  n({0},  P)  as  n(P)  in  cases  when  there  will  be  be  no  confusion. 

The  quantity  II(A,  P)  represents  a  useful  tool  for  quantitatively  analyzing  the  relationship 
between  the  characteristics  of  the  data  (as  defined  by  0  and  R)  and  the  structure  of  the  estimate 
7.  In  the  examples  provided  in  Section  4,  we  utilize  n(A,  P)  to  explore 


Submitted  to  Applied  and  Computational  Harmonic  Analysis 


14 


1.  The  information  contributed  by  a  single  sensor  relative  to  that  in  the  prior  model. 

2.  The  maimer  in  which  data  from  a  group  of  sensors  is  fused  in  forming  7. 

3.  The  incremental  benefits  associated  with  the  addition  of  data  from  the  (z  +  1)*‘  sensor  to  an 
estimate  based  upon  the  first  i  measurements. 

4.  The  quality  of  estimates  at  different  scales  and  the  scales  at  which  active  fusion  takes  place  in 
that  the  relative  error  covariance  achieved  using  more  than  one  sensor  is  significantly  reduced 
compared  to  that  using  any  single  sensor  by  itself. 

Consider,  for  example,  the  case  in  which  we  wish  to  assess  the  overall  value  of  a  set  of  sensors. 
That  is,  suppose  that  A  =  0  and  B  —  {any  set  of  sensors}  so  that  J1(A,B)  =  T1{B)  measures 
the  contribution  of  the  information  provided  by  this  set  of  sensors  relative  to  that  of  the  prior 
model.  We  begin  by  defining  n”(5)  as  the  value  of  the  element  on  the  diagonal  of  the  matrix 
n(H)  corresponding  to  the  wavelet  coefficient  at  scale/shift  (m,  n)'*.  As  Fq  is  diagonal, 
is  interpreted  as  the  relative  decrease  in  the  error  veiriiince  associated  with  the  component  in  the 
wavelet  transform  of  g  at  scale/shift  (m,  n).  H  n™(H)  is  large  then  the  data  provides  considerable 
information  regarding  the  structure  of  g  at  (m,  n).  In  particular,  this  quantity  provides  us  with 
a  natural  way  in  which  to  define  the  scede  at  which  g  should  be  reconstructed  at  each  location. 
Specifically,  consider  the  finest  scale  of  our  representation,  namely,  the  scaling  coefficients  g{Mg,j). 
At  each  point  j  we  can  examine  the  quality  of  the  information  provided  at  this  point  at  the  finest 
scale  and  at  all  coarser  scale  “ancestors”  of  j.  Using  the  terminology  introduced  in  Section  2.2,  we 
say  that  the  data  supports  a  reconstruction  of  g{Mg,j)  at  scale  m  if  there  exists  some  node  in  the 
wavelet  lattice  of  g  at  scale  m  which  satisfies  the  following 

1.  The  node  impacts  g{Mg,j)  (i.e.  for  some  shift  n,  g{Mg,j)  E  V{m,n))  so  that  (m,n)  is  an 
ancestor  of  {Mg,j). 

2.  The  data  provides  a  sufficiently  large  quantity  of  information  regarding  the  structure  of  g  at 
node  (m,n)  (i.e.  n™(5)  is  in  some  sense  large). 

Clearly,  the  finest  level  of  detciil  supported  by  a  data  set  is  the  finest  scale  for  which  a  node  (m,  n) 
may  be  foimd  that  satisfies  the  above  two  criteria  and  in  general  is  a  function  of  position  (i.e.  a 
function  of  the  shift  j  at  scale  Mg.)  The  precise  quantification  of  “sufficiently  large”  will  depend 
upon  the  particular  application  and  on  the  structure  of  the  particular  inverse  problems  under 
investigation. 

In  addition  to  its  use  is  assessing  the  scale  of  reconstruction  supported  by  the  information  from 
a  set  of  sensors,  if  we  consider  the  case  where  neither  A  nor  B  is  empty,  we  find  that  there  are 
several  ways  in  which  n(A,  H)  may  be  of  use  in  assessing  the  value  of  fusing  information  from 
multiple  sensors  and  in  identifying  how  this  fusion  takes  place.  For  example,  if  A  C  5,  then 
n(A,  B)  provides  us  with  a  measure  of  the  value  of  augmenting  sensor  set  A  to  form  sensor  set 
B.  Roughly  speaking,  if  n(A,  B)  is  significantly  larger  than  0,  there  is  a  benefit  in  the  additional 
information  provided  by  the  sensors  in  R  —  A.  Moreover,  if  we  define  n”(A,  B)  as  before  as  the 
diagonal  elements  of  n(A,  B)  corresponding  to  the  (m,  n)  wavelet  coefficient,  then  we  can  use  these 

‘At  scale  mj=  Lg,  we  are  interested  in  both  the  wavelet  and  scaling  coefficients  of  g.  To  avoid  ambiguity,  we  use 
the  notation  Hn’  to  refer  to  the  RECM  information  for  the  coarsest  scaling  coefficient  of  g  at  shift  n. 
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quantities  to  pinpoint  the  scales  and  locations  at  which  this  fusion  has  significant  benefit®  i.e.,  those 
scales  and  shifts  at  which  active  sensor  fusion  is  taking  place.  Furthermore,  by  varying  the  sets 
A  and  B,  we  can  identify  not  only  the  optimal  scale  for  reconstruction  at  each  point  but  can  also 
identify  which  sensors  are  actively  used  to  obtain  that  estimate.  That  is,  for  each  (m,n),  we  can 
in  principal  find  the  set  A  C  {1,  A}  so  that  {1,  . . . ,  K})  is  small  (so  that  sensors  not 

in  A  provide  little  additional  information  to  the  reconstruction  of  wavelet  coefficient  (m,  n))  and 
so  that  for  any  C  C  A,  n^(C,  .4)  is  of  significant  size  (so  that  aU  of  the  sensors  actively  contribute 
to  the  reconstruction  at  this  scale  and  shift.) 
ref 


4  Examples 

The  vehicle  for  illustrating  the  MAP  estimator  and  associated  analysis  techniques  developed  in 
Section  3  is  a  two  chaimel  deconvolution  problem  configured  in  several  ways  to  illustrate  a  variety 
of  different  facets  of  our  approach.  The  function  to  be  reconstructed  is  assumed  to  be  a  1//  type 
of  process  defined  by  the  parameters  in  Table  2  and  the  particular  sample  path  of  this  process  used 
in  our  examples  is  displayed  in  Figure  4. 

The  convolutional  nature  of  the  problem  implies  that  Ti{x,  x')  =  Ti{x  —  x')  =  Ti(^)  for  i  =  f,c. 
The  two  kernels  used  in  the  exeimples  here  are  plotted  in  Figure  1  and  the  operator  matrices  Tf 
and  Tc  are  shown  in  Figure  3.  The  output  of  the  sensor  corresponding  to  Tf  provides  relatively  fine 
scale  information  about  g  in  comparison  to  that  provided  by  the  sensor  corresponding  to  since 
much  of  the  fine  scale  variation  in  g  is  removed  under  the  averaging  action  of  this  operator. 

The  ability  of  the  wavelet  to  compress  the  information  in  these  operators  is  illustrated  in  Figure 
5.  Because  the  wavelet  transform  is  orthonormal,  the  energy  in  Tj  and  ,  is  the  same  for  i  G  {/,  c} 
(i.e.  WTiWp  =  ||0i|iF  where  ![  •  ||i^  is  the  Frobenius  norm);  however,  this  energy  is  concentrated 
in  fewer  entries  in  the  wavelet  domain  operators  than  in  their  space  domain  counterparts.  To 
illustrate  this  property,  define  the  quantity  Aj(n)  (resp.  Hi(n))  as  the  energy  in  the  first  n  largest 
(in  magnitude)  components  of  T,  (resp.  0^).  Further,  assume  that  Ei{n)  and  Ei(n)  are  normalized 
by  the  total  energy  in  the  respective  operators.  In  the  case  of  the  two  operators  considered  here, 
we  plot  Ef{n)  and  H^(n)  in  Figure  5(a)  and  Ec{n)  and  Sc(n)  in  Figure  5(b).  Note  that  as  with  the 
operators  considered  by  Beylkin  et.  al  in  [4],  for  both  operators  considered  here,  any  given  level 
of  energy  is  cont^uned  in  far  fewer  coefficients  in  the  transform  domain  than  in  the  physical  space 
domain.  In  fact,  to  captme  95%  of  the  energy  in  Tf  requires  2150  elements  while  only  712  need 
be  retained  in  0^;  a  factor  of  three  difference.  In  the  case  of  T,,,  roughly  14,000  components  are 
required  to  retain  95%  of  the  energy  while  only  149  elements  are  needed  for  0^  which  is  savings 
of  almost  two  orders  of  magnitude.  This  suggests  that  the  transform  domain  matrices  may  be 
well  approximated  by  sparse  matrices  obtained  by  setting  their  negligible  components  to  zero  so 
that  computationally  efficient,  sparse  matrix  routines  can  be  used  to  solve  the  normal  equations. 
We  note  that  the  use  of  higher  order  wavelets  would  result  in  even  sparser  0j  and  that  a  detailed 
analysis  of  computationally  efficient,  multiscale  inversion  algorithms  is  presented  in  [43]. 

^In  this  case,  because  Pa  is  not  in  general  diagonal,  the  diagonal  elements  of  n(A,B)  do  not  have  the  exact 
interpretation  as  the  relative  size  difference  of  the  error  variance  of  7  based  upon  data  from  A  and  B\  however  the 
size  of  these  diagonal  components  of  11(^4,  jB)  still  lends  insight  as  to  the  scales  and  shifts  where  the  observations 
from  set  B  provide  information  not  found  in  the  data  from  set  A. 
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Property 

Value 

Wavelet 

Daubeclues  6-tap 

Finest  scale  {Mg) 

7 

Coarsest  Scale  {Lg) 

3 

2.0 

<7^ 

10 

Vl, 

0.25 

Table  2:  Parameter  values  for  g 


Figure  4:  Fractal  function  to  be  reconstructed.  Approximation  coefficients  at  scale  Mg  =  7. 
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(a)  Ef{n)  (Solid  line)  and  -/(n)  (dashed  line)  (b)  Ec{n)  (Solid  line)  and  Hc(7i)  (dashed  line) 


Figure  5;  Plots  of  normalized  energy  in  the  largest  n  component  of  Ti  and  0^  as  a  function  of 
n.  Note  that  for  both  the  fine  and  coarse  scale  operators,  energy  is  more  concentrated  in  the 
transform  domedn  than  in  the  space  domain  in  that  any  given  level  of  energy  is  contained  in  far 
fewer  coefficients  in  0,  than  in  the  corresponding  Tj. 

4.1  The  Full  Data  Case:  Equal  SNRs 

As  a  first  example,  we  consider  the  case  where  a  full  set  of  data  is  available  from  both  sensors  and 
the  signal  to  noise  ratio  of  each  observation  is  the  same  and  equal  to  1.  In  this  work,  the  signal  to 
noise  ratio  of  the  vector  r)i  =  0^7  +  i/j  with  i/j  ~  A/'(0,  r]I)  and  7  ~  A/*(0,  Pq)  is  defined  as 

SNR}  -  ^<*7  _  t'’'{QiPoQf) 

’  Power  per  pixel  in  Ui  ^g'^i 

where  Ng  is  the  length  of  the  vector  7  and  tr  is  the  trace  operation.  The  noiseless  and  noisy  data 
sets  are  shown  in  Figure  6.  In  Figure  7(a),  g{{f,c})  is  graphed  against  g  while  Figs.  7(b)  and  7(c) 
display  g{{f,c})  vs.  g{{f})  and  ff({c})  respectively.  These  plots  demonstrate  that  given  data  of 
equal  quality  (i.e.  equal  SNR’s),  the  MAP  estimator  bases  the  overall  reconstruction  primarily  on 
the  fine  scale  data  source  yf.  In  Figure  7(d),  we  compare  two  versions  of  g.  The  solid  line  is  a 
graph  of  g  in  which  all  coefficients,  7(m),  are  used  at  all  scales  in  forming  g{Mg)  while  the  dashed 
line  is  a  reconstruction  in  which  7(771)  for  tti  >  4  are  set  to  zero.  This  picture  indicates  that  and 
yf  convey  no  useful  information  regarding  g  at  scales  finer  than  4. 

Analysis  of  the  relative  error  covariance  matrices  provide  much  additional  insight  into  the 
maimer  in  which  the  data  axe  used  to  form  g.  Due  to  the  full  data  condition  and  the  fact  that  Pq 
is  a  function  oidy  of  scale,  the  RECM  information  is  basically  a  function  only  of  scale  and  does  not 
V3iry  considerably  from  shift  to  shift  over  any  given  scale.  Thus  we  define  n’”(A,  B)  as  the  average 
vaJue  of  n™(A,  B)  taken  over  all  shifts  n  at  scale  m.  In  Table  3,  the  values  of  n’”({/,  c}),  n’”({/}), 
and  n’”({c})  are  given  in  percent  for  all  m  defined  in  the  wavelet  transform  of  g.  Hence  the  first 
column  indicates  the  percent  reduction  in  variance  as  a  function  of  scale  for  an  inversion  based 
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upon  Uf  and  where  this  reduction  is  taken  relative  to  the  prior  model.  Similar  interpretations 
hold  for  the  second  and  third  columns.  The  last  column  in  Table  3  is  the  average  value  at  each 
scale  of  the  RECM  obtained  when  the  coarse  scale  data,  t/c,  is  added  to  an  inversion  based  upon 
Uf.  Finally,  note  that  last  row  of  this  table  provides  the  RECM  information  associated  with  the 
estimates  of  the  coarsest  scaling  coefficients  of  g. 

Comparison  of  the  data  in  the  first  three  columns  indicates  that,  given  both  sets  of  data,  the 
bulk  of  the  variance  reduction  is  attributable  to  the  information  present  in  j//.  Moreover,  the 
information  in  the  observations  at  scales  5,  6,  and  7  is  negligible.  In  the  first  column  of  Table  3 
(where  both  and  y^  are  used  in  the  inversion)  we  see  a  20%  and  63%  variance  reduction  in  the 
estimates  7(4)  and  7(3)  respectively  and  a  98%  reduction  in  the  estimates  of  the  coarsest  scaling 
coefficients,  y(3).  In  the  second  column  (where  only  y^  is  used  to  determine  7),  similar  RECM 
data  is  present.  Prom  column  three  of  Table  3  (where  oxdy  y^  is  used),  we  conclude  that  the  noisy, 
coarse  scale  data  is  useful  only  in  reducing  the  variance  for  the  components  of  7  at  scale  3.  Lastly, 
column  four  shows  that  the  addition  of  the  coarse  scale  data  to  an  estimate  based  upon  y^  only 
provides  incremental  benefit  in  the  estimates  of  y(3). 

Prom  this  analysis,  we  observe  that  there  is  no  sensor  fusion  taking  place  in  an  estimate  based 
upon  both  yf  and  y^.  That  is,  tmder  this  particuiair  full  data,  equal  SNR  scenario,  the  information 
in  yc  is  largely  ignored  in  constructing  y({/,  c}).  The  data  in  Table  3  also  implies  that  there  is  a 
limit  to  the  level  of  detail  supported  in  a  reconstruction  of  g  based  upon  yj.  In  fact,  the  values  of 
II"'  are  considerably  smaller  at  the  finer  scales  (5,  6,  and  7)  than  at  the  coarser  scales  (3  cind  4). 
Prom  this,  we  conclude  that  neither  set  of  data  alone  or  together  provides  sufficient  information 
for  the  reconstruction  of  detail  in  g  finer  than  that  found  at  scale  4. 

We  note  that  the  information  provided  by  the  relative  error  covariance  matrices  is  consistent 
with  the  actual  estimates  graphed  in  Fig.  7  where  we  saw  that  y({/,  c})  essentially  is  the  same  as 
y({c}),  and  that  g{{f,  c})  does  in  fact  contain  little  detail  at  scales  finer  than  four.  The  use  of  the 
RECM  is  significant  because  it  allows  for  the  formulation  of  these  conclusions  before  any  data  are 
obtained.  Thus,  the  RECM  represents  a  useful  tool  for  the  design  and  evaluation  of  experiments 
where  multiple  sensors  are  to  be  used  in  the  recovery  of  some  underlying  quantity.  In  this  example, 
one  would  conclude  that  the  coarse  scale  sensor  is  of  little  or  no  use  in  the  recovery  of  g  and  that 
additional  observation  processes  are  required  to  resolve  very  fine  scale  structure  in  g. 

Additionally,  the  relative  error  covariance  matrix  analysis  can  be  used  to  evaluate  a  pairticular 
parameterization  of  g.  Given  the  structure  of  the  observation  processes,  we  see  that  g  is  overpa¬ 
rameterized  as  the  data  provide  little  useful  fine  scale  information  relative  to  that  found  in  the 
prior  model.  Any  attempt  to  recover  these  components  of  g  is  effectively  a  waste  of  time  and 
computationcd  resources.  Rather,  the  RECM  suggests  that  a  more  parsimonious  description  of  g  is 
wsuranted  and  even  indicates  how  such  a  model  should  be  constructed  based  upon  the  information 
available  in  the  data.  That  is,  given  the  structure  of  the  observation  processes,  the  original  pa- 
rauneterization  of  g  involving  256  degrees  of  freedom  is  clearly  excessive.  Rather,  the  data  dictates 
that  at  most  only  32  parameters  (the  coarse  scaling  coefficients  and  the  detail  coefficients  at  scales 
3  and  4)  can  be  accurately  recovered. 

4.2  The  Full  Data  Case:  Unequal  SNRs 

As  a  second  example,  consider  the  case  where  again  fuU  data  is  provided  for  both  observation  pro¬ 
cesses,  but  the  level  of  noise  in  y/  is  much  greater  than  that  of  y^.  Here  we  take  the  SNRc  =  i  while 
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6| - - - ^  6, 


(a)  Noiseless  (solid  line)  and  noisy  (b)  Noiseless  (solid  line)  and  noisy 

(dashed  line)  versions  oft//.  SNRf  =  1  (dashed  line)  versions  of  j/c-  SNRc  =  1 


Figure  6:  Data  sets  for  use  in  full  data  reconstruction  with  the  SNRf  =  SNRc  =  1 


Scale  m 

ioo*n'"({/,c}) 

100*n"‘({/}) 

100  *n'”({c}) 

100.n”*({/}{/,c}) 

7 

0.0048 

0.0047 

0.0001 

0.0001 

6 

0.0622 

0.0600 

0.0020 

0.0023 

5 

1.2246 

1.1785 

0.0475 

0.0496 

4 

19.0872 

18.4934 

0.9166 

0.7705 

3 

62.7417 

60.5813 

10.9863 

5.7320 

3 

98.1754 

96.7171 

90.8045 

45.8975 

Table  3:  Percent  relative  error  variance  reduction  for  full  data  inversion  with  SNRf  =  SNRc  =  1- 
Comparison  of  the  first  through  third  columns  indicates  that  the  fine  scale  data  provides  most  of 
the  variance  reduction.  The  fourth  column  demonstrates  the  the  incremental  information  provided 
by  the  coarse  scale  observation  process  is  seen  primsirily  in  the  estimates  of  the  coarsest  scaling 
coefficients. 
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50  ’00  -^0  200  250  50  100  150  200  250 


(a)  3  (solid  line)  versus  j({/,  c})  (dashed  (b)  g{{f,c})  (solid  line)  versus  g{{f}) 

line)  (dashed  line) 


(c)  g{{f,c})  (solid  line)  versus  g{{c})  (d)  j({/,  c})  constructed  using  detsdl  at 

(dashed  Ene)  sJl  scales  (solid  line)  versus  s({/ic}) 

comprised  of  only  3(3),  7(8),  and  7(4) 

(dashed  line) 

Figure  7:  Estimates  of  g  using  various  combinations  of  fine  and  coarse  scale  data  for  the  equal  SNR 
experiment.  Prom  (b)  and  (c)  we  observe  that  given  both  sets  of  equally  noisy  data,  the  estimator 
uses  primarily  the  information  from  the  process  yf.  In  (d),  5  is  reconstructed  ignoring  any  detail 
estimates,  7(m),  at  scales  finer  than  4  and  compared  to  the  estimate  g  in  which  cdl  available  detail 
is  used.  In  this  case  we  observe  that  yj  and  y^  provide  little  useful  information  at  scales  5  through 
7. 
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SNRf  =  1.  Inversion  problems  with  these  characteristics  arise  quite  frequently  in  practice.  For 
example,  in  geophysical  prospecting,  the  fine  scale  process  may  arise  from  an  electrical  measure¬ 
ment  using  high  frequency  electromagnetic  fields  to  probe  the  structure  of  the  earth.  These  fields 
tend  to  suffer  attenuation  due  to  the  lossy  characteristics  of  the  medium  giving  rise  to  low  signal  to 
noise  ratios.  Alternatively,  the  coarse  scale  observation  processes  are  associated  with  low  frequency 
observations  for  which  either  attenuation  is  small  or  energy  is  high  resulting  in  higher  SNR.  The 
function  g  to  be  recovered  is  the  same  as  in  the  first  example  and  the  estimates  themselves  are 
shown  in  Fig.  8.  As  in  the  previous  case,  it  is  clear  just  from  these  plots  that  very  fine  scale  detail 
is  not  supported  by  these  data  sets;  however,  it  is  less  obvious  as  to  the  manner  in  which  data  from 
each  set  contributes  to  the  overall  reconstruction. 

Consider  the  RECM  information  in  Table  4.  As  with  the  previous  case,  the  structure  of  the  prior 
model  and  the  measurements  processes  imply  that  little  is  lost  in  examining  averages  of  RECM 
components  over  all  shifts  at  a  given  scale.  Prom  the  data  in  the  last  row  of  Table  4  it  is  clear  that 
for  the  coarsest  scaling  coefficients,  both  yf  and  yc  provide  comparable  and  close  to  full  information 
relative  to  that  of  the  prior  model.  For  the  estimates  of  the  wavelet  coefficients  at  scales  3  and 
4,  we  see  a  significant  amoimt  of  sensor  fusion  taking  place.  In  particular,  at  scale  3,  the  use  of 
yf  (resp.  yc)  alone  provides  a  variance  reduction  of  about  60%  (resp.  59%);  however,  given  both 
sets  of  data,  this  statistic  jumps  to  75%.  Thus,  the  ability  to  resolve  the  wavelet  coefficients  of 
g  at  scale  3  is  significantly  improved  when  both  set  of  data  are  available  to  the  inversion  tham  is 
the  case  when  either  acts  alone.  A  similar  argument  holds  for  the  information  contained  in  the 
observations  regarding  the  structure  of  g  at  scale  4.  Table  4  indicates  that  fusion  also  occurs  at 
scale  five  although  the  data  at  this  scale  is  obviously  less  reliable  than  at  coarser  scale.  It  is  clear 
that  neither  data  source  provides  significant  information  at  the  finest  scales:  6  and  7. 

Unlike  the  full  data,  equal  SNR  example  in  Section  4.1,  the  RECM  here  provides  significant 
information  not  readily  obtained  by  examination  of  only  the  estimates.  Specifically,  we  are  able  to 
pinpoint  exactly  where  in  scale  active  space  sensor  fusion  is  occurring  and  quantify  its  magnitude. 
Moreover,  our  analysis  is  of  great  use  in  capturing  the  effects  of  noise  on  the  level  of  detail  supported 
by  a  given  source  of  data.  Comparing  the  results  of  this  experiment  with  those  of  the  preceding 
section,  we  see  from  the  fourth  columns  of  Tables  3  and  4  that  the  higher  SNRc  alters  w'here  in 
scale  space  yc  contributes  information  relative  to  that  foimd  in  .  In  Section  4.1,  the  coarse  scale 
process  contributes  only  to  the  estimates  of  the  coarsest  scaling  coefficients  while  in  this  case,  yc 
provides  additional  information  regarding  y(3)  and  the  wavelet  coefficients  at  scale  3  (and  to  a 
lesser  extent  the  wavelet  coefficients  at  scale  4.) 

4.3  The  Incomplete  Data  Case:  Boundary  Measurements 

A  common  characteristic  of  linear  inverse  problems  is  the  desire  to  estimate  g  over  some  closed 
and  bounded  region  based  upon  measurements  some  of  which  sire  available  only  at  or  near  the 
boundary  of  this  region  [5,14,20,21,33,38].  Such  a  situation  may  arise,  for  example,  in  a  geophysical 
setting.  Here  one  may  be  interested  in  ascertaining  the  conductivity  structure  or  acoustic  properties 
of  a  rock  formation  given  electromagnetic  data  which  provide  fine  scale  information  only  near  a 
few  boreholes,  together  with  coaxser-resolution,  sonic  data  (e.g.  from  ground- penetrating  radar 
or  surface  seismic  surveys)  which  in  contrast  have  full  coverages  over  the  entire  interweU  region. 
This  type  of  observation  configmation  leads  to  both  theoretical  as  well  as  computational  difficulties. 
From  a  theoretical  perspective,  problems  of  this  class  tend  to  be  extremely  iU-posed  in  that  solutions 
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0.61- 


50  too  •‘>0  *00  250 


(a)  g  (solid  line)  versus  j({/,  c})  (dashed 
line) 


(c)  #({/,  c})  (solid  line)  versus  j({c}) 
(dashed  line) 


(b)  ff({/,  c})  (soEd  line)  versus  s({/}) 
(dashed  line) 


(d)  g{{f,c})  constructed  using  detail  at 
all  scales  (solid  Ene)  versus  g{{f,c}) 
comprised  of  only  p(3),  7(3),  and  7(4) 
(dashed  Ene) 


Figure  8:  Estimates  of  g  using  vtirious  combinations  of  fine  and  coarse  scale  data  for  the  unequal 
SNR  experiment.  Prom  (b)  and  (c)  we  observe  that  some  form  of  active  sensor  fusion  is  taking 
place  as  the  estimate  given  both  sets  of  data  is  clearly  different  from  that  obtained  when  either 
data  set  is  used  alone.  In  (d),  g  is  reconstructed  ignoring  any  detail  estimates,  7(m),  at  scales  finer 
than  4  and  compared  to  the  estimate  g  in  which  all  available  detail  is  used  from  which  we  observe 
that  yf  and  provide  little  useful  information  at  scales  5  through  7. 
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Scale  m 

ioo*n-({/,c}) 

100*  n”‘({/}) 

ioo*n”‘({c}) 

ioo*n”‘({/}{/,c}) 

7 

0.0057 

0.0047 

0.0010 

0.0011 

6 

0.0871 

0.0600 

0.0267 

0.0279 

5 

1.7835 

1.1785 

0.6457 

0.6431 

4 

25.3244 

18.4934 

10.1778 

8.7822 

3 

75,9424 

60.5813 

59.1247 

39.6413 

3 

99.4718 

96.7171 

98.9946 

84.8110 

Table  4:  Percent  relative  error  variance  reduction  for  full  data  inversion  with  SNRf  =  1  and 
SNRc  —  4.  Unlike  the  first  example,  the  high  quality,  coarse  scale  data  now  provides  significant 
information  to  the  inversion.  From  the  first  three  columns,  the  bold  faced  values  indicate  where 
active  sensor  fusion  taking  place.  Specifically,  at  scales  3  and  4  the  percent  variance  reduction  is 
significantly  higher  given  both  sets  of  data  than  is  the  case  when  either  Tjf  or  j/c  is  used  alone.  The 
fourth  column  shows  that  the  incremental  information  provided  by  the  coarse  scale  observation 
process  is  seen  at  the  coarsest  two  scales. 


to  these  inverse  problems  are  very  sensitive  to  perturbations  in  the  data.  Upon  linearization, 
these  theoretical  difficulties  are  reflected  in  discretized  linear  systems  with  very  high  condition 
numbers  so  that  regularization  is  required.  Additionally,  as  discussed  in  Section  1  for  problems 
with  a  convolutional  structure,  the  sparse  and  “gappy”  distribution  of  data  points  makes  the  use 
of  Fourier-based  techniques  problematic. 

In  contrast,  the  multiscale,  statistical  MAP  inversion  algorithm  we  have  described  is  ideally 
suited  to  handling  such  problems.  To  illustrate  this,  we  consider  a  variation  on  the  two  channel 
deconvolution  problem  with  S N R^  =  S N R^  =  3;  however,  we  assume  that  yf  is  available  only  near 
both  ends  of  the  interval.  In  this  case,  the  data  sets  are  shown  in  Figure  9.  In  solving  the  inverse 
problems,  regularization  is  provided  by  the  prior  model  as  discussed  in  Section  3.2.  Moreover,  this 
sampling  structure  is  handled  quite  easily  using  wavelet  transforms.  Specifically,  we  spht  yf  into 
its  left  and  right  components,  and  yf^,  and  treat  each  separately.  In  effect,  this  is  equivalent 
to  windowing  yf  and  applying  W/  individually  to  each  windowed  version  of  the  data.  We  note 
that  unlike  Fourier  techniques  where  space-domain  windowing  can  cause  significant  distortion  of 
the  signal  in  the  frequency  domain,  no  significant  distortion  is  present  here®. 

The  estimates  of  g  axe  displayed  in  Figure  10.  We  see  that  over  the  middle  of  the  interval, 
g{{f,c})  is  roughly  the  same  as  y({c})  while  at  either  end,  information  from  yf  is  used  almost 
exclusively  in  the  inversion.  Additionally,  Figure  10  shows  that  given  only  yf ,  the  estimator  does 
make  an  attempt  to  recover  g  over  the  interior  of  the  interval,  but  such  an  estimate  is  increasingly 
in  error  the  farther  one  proceeds  toward  the  middle. 

In  Figure  ll(a)-(d),  the  diagonal  components  of  11(5)  are  plotted  for  B  C  {{/},{c},{/,c}} 
and  for  scales^  3  and  4.  We  observe  that  for  scale-shift  pairs  (m,  n)  interior  to  the  boundary  region 

®Thc  only  distortion  is  caused  by  the  edge  effects  arising  from  the  circuiant  implementation  of  the  wavelet  transform 
as  discussed  in  Section  2.2  and  as  we  have  discussed,  these  effects  are  generally  neghgible  or  can  be  overcome 
completely  through  the  use  of  modified  wavelet  transforms  obtained  over  compact  intervals 

^The  unusual  activity  at  the  right  hand  edge  of  these  plots  is  an  artifact  of  the  circuiant  implementations  of  the 
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in  which  fine  scale  data  are  available,  n™({/})  is  essentially  zero  indicating  the  almost  complete 
lack  of  information  in  j//  about  g  over  these  shifts.  However,  for  pairs  (m,  n)  corresponding  to 
locations  near  either  boundary,  the  story  is  different.  Here,  information  in  r/f  almost  completely 
dominates  that  in  i/c  as  was  the  case  in  the  first  example.  InJFigures  11(d),  the  utility  of  adding 
to  an  estimate  based  upon  yj  is  illustrated  by  displaying  n^({/},  {/>  c}).  Again  the  contribution 
of  the  coajse  sczde  data  is  greatest  away  from  the  end  of  the  interval.  In  Figures  11(a)  and  (b),  we 
observed  the  presence  of  active  sensor  fusion  over  selected  shifts  at  these  scale.  That  is  for  certain 
n  and  for  j  e  {3,4},  n{({/,  c})  is  significantly  larger  that  both  n{({c})  and  n^({/}).  Thus,  the 
RECM  is  able  to  localize  both  in  scale  and  in  shift  the  precise  locations  where  the  presence  of  both 
data  sets  yields  significantly  more  information  than  either  alone.  Finally,  for  scales  other  than  3 
and  4,  the  two  observation  sources  provide  little  if  any  significant  information  to  the  reconstruction 
of  g. 

Unlike  the  previous  examples  where  both  data  sets  were  available  over  the  entire  interval, 
for  the  case  considered  here,  we  are  quite  justified  in  defining  the  shift-varying  optimal  scaile  of 
reconstruction  given  both  and  yj .  As  described  in  Section  3.3,  we  say  that  a  data  set  A  supports 
a  reconstruction  of  g{Mg,n)  to  scale  m  if  there  exists  some  node  (m,  n)  such  that  (1)  g{Mg,j)  £ 
2?(m,  n)  and  (2)  n™(A)  is  sufficiently  large.  The  finest  level  of  detail  supported  in  a  reconstruction 
at  shift  j,  which  we  denote  by  Tn'{j),  is  the  finest  scale  for  which  a  node  may  be  foimd  that  satisfies 
the  above  two  conditions.  For  the  problems  considered  here,  the  diagonal  structure  of  Pq  implies 
that  0  <  n™(A)  <  1  so  that  determining  whether  n”(A)  is  “sufficiently  large”  is  accomplished  by 
comparing  this  quantity  to  some  threshold,  r,  between  zero  and  one.  This  procedure  for  determining 
the  optimal  scale  of  reconstruction  imphes  that  we  need  consider  only  those  nodes  in  the  wavelet 
lattice  of  g  for  which  n”(A)  >  r.  Hence,  we  are  led  to  define  7.^,  a  truncated  version  of  7,  as  follows: 


[7T](T7J,n)  — 


0  n™(A)  <  T 

[7](m,n)  Otherwise 


(22) 


where  [7](m,n)  is  the  component  in  the  vector  7  at  scale  m  and  shift  n.  Defining  7^  in  this  way 
enstnes  that  gr  =  VV’^T'r  is  in  fact  the  reconstruction  of  g  which  at  each  shift  j  contains  detail 
information  at  sczdes  no  finer  than  m'{j). 

In  Figure  12,  we  plot  the  finest  scale  supported  in  a  reconstruction  of  g  using  the  noisy  data 
sets  of  Figure  9  for  r  =  0.45.  Here  we  see  that  near  the  boundaries,  the  presence  of  fine  scale 
data  allows  for  higher  resolution  in  the  reconstruction  of  g  while  in  the  middle  of  the  interval,  we 
must  settle  for  a  coarser  estimate.  From  Figure  13  we  see  that  there  is  little  difference  between  the 
optimal  estimate,  g,  and  its  truncated  version,  jo.45-  This  provides  further  evidence  that  the  RECM 
is  the  right  tool  for  precisely  evaluating  the  maimer  in  which  the  data  contributed  information  to 
the  reconstruction  of  g.  FinaUy,  in  Figure  14,  the  finest  scale  supported  in  a  reconstruction  as  a 
function  of  both  position  and  threshold  is  displayed.  Here,  the  horizontal  axis  represents  the  shift, 
n,  at  the  finest  scale,  Mg  =  7  ,  the  vertical  axis  is  the  value  of  r,  and  the  grey  tones  represent 
the  finest  scale  of  resolution  supported  by  the  data  at  shift  n  using  threshold  r  with  darker  shades 
indicate  finer  sczdes.  Increasing  r  implies  that  we  require  more  information  from  the  data  to  say 
that  the  observations  support  reconstruction  at  finer  scales.  Hence,  for  the  problems  here,  with  r 
greater  than  about  0.7,  we  conclude  only  the  coarsest  information  in  g  may  be  recovered  given  the 
data.  For  t  less  than  0.7,  the  situation  is  much  the  same  as  was  seen  in  the  analysis  of  Figure  12 


H  and  G  filters  as  discussed  in  Section  3.3 
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(a)  Noiseless  (solid  line)  and  noisy 
(dashed  line)  versions  oft//.  SNRf  —  3 
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(b)  Noiseless  (solid  line)  and  noisy 
(dashed  line)  versions  of  yc.  SNRc  =  3 


Figure  9;  Data  sets  for  use  in  reconstruction  with  the  SNRf  =  SNRc  —  3  and  j//  available  only 
near  the  end  of  the  interval. 

with  fine  scale  detail  recoverable  near  the  boundaries  and  a  coarse  reconstruction  near  the  middle 
where  only  yc  is  present. 

4.4  The  Incomplete  Data  Case:  Coarse  Scale  Data  Sampled  Coarsely 

In  the  preceding  example,  the  cocurse  scale  data  not  only  had  complete  coverage  over  the  entire 
interval  of  interest,  but  they  also  were  available  at  the  finest  scale  of  resolution  i.e.  a  coarse 
measurement  yc  was  available  for  every  shift,  n,  at  the  finest  scale  of  om  representation.  What 
is  more  reahstic  in  practice,  of  course,  is  to  have  coarse-resolution  data  available  at  a  sampling 
interval  commensurate  with  the  resolution  of  the  data.  In  this  last  example,  we  demonstrate  that 
our  methodology  can  be  directly  appHed  to  such  problems  as  well.  In  particular,  we  consider 
basically  the  same  measurement  configuration  as  in  Sections  4.1  and  4.2  expect  in  this  case  the 
coarse-resolution  measurement  process,  y<.>  is  available  only  on  a  sparsely  sampled  grid  covering 
the  interval  of  interest.  In  particular,  for  this  example  we  assume  that  the  measurements  yc  are 
available  on  a  grid  that  is  decimated  by  a  factor  of  8  compared  to  that  used  in  the  previous  section. 
For  this  exercise,  we  also  assume  that  we  have  fine  scale  data  over  the  entire  interval  and  at  the 
original,  finer  sampUng  rate,  and  we  also  take  SNRf  =  1  and  SNRc  —  4.  Note  that  the  difference 
in  sampling  grids  for  our  two  measurement  sets  is  of  no  consequence  for  the  applicability  of  our 
methodology,  as  we  simply  use  DWT’s  appropriate  to  each.  The  substantive  difference,  of  course, 
is  that  the  smaller  number  of  measurement  point  in  j/^  has  fewer  scales  of  decomposition,  but  this 
is  automatically  accommodated  in  our  formulation. 

In  Figure  15,  g{{f})  and  g{{f,  c})  are  compared  for  this  example  as  well  as  for  the  corresponding 
case  in  which  a  full  set  of  coarse-resolution  data  (at  SNRc  =  4)  is  available  on  the  original,  dense 
saunpling  grid  (i.e.  the  case  considered  in  Section  4.2.)  Although  not  exact  matches,  the  lose  of 
information  incurred  by  the  sparse  availability  of  yc  obviously  is  not  severe.  The  RECM  data  for  this 
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(a)  s({/,c})  (solid  line)  versus  ff({/}) 
(dashed  Ene) 


(b)  ff({/,  c})  (soEd  Ene)  versus  ff({c}) 
(dashed  Ene) 


Figure  10:  Estimates  of  g  using  various  combinations  of  yf  and  for  the  case  where  SNRf  = 
SNRc  —  3  amd  yf  is  available  only  near  the  edges  of  the  interval.  We  see  that  at  the  boundaries, 
the  estimate  given  both  y^  and  yf  essentially  makes  use  only  of  yf.  Over  the  center  of  the  interval 
where  yf  is  absent,  g{{f,c})  follows  y({c})  closely. 


experiment  are  provided  in  Table  5.  It  is  useful  to  compare  this  information  with  the  corresponding 
results  for  the  example  considered  in  Section  4.2  where  we  had  the  same  SNR  structure  but  full 
data  for  both  t/c  and  yf.  At  fine  scales,  the  story  for  this  case  is  much  the  same  as  in  that  previous 
example  with  the  data  providing  little  useful  information  at  scales  5  and  finer.  At  scales  3  and  4  a 
comparison  of  Tables  5  and  4  indicate  that  the  sparse  availability  of  y^  is  reflected  in  smaller  values 
of  n"*({c})  and  n’"({/,  c}).  Prom  the  first  columns  of  these  tables  we  see  that  the  presence  of  both 
yc  emd  yf  results  in  comparable  ability  to  recover  detail  at  these  coarser  scales  regardless  of  the 
availability  of  the  coarse  data.  When  yc  is  the  only  source  of  information,  the  relative  reduction  in 
variance  drops  rather  sharply  for  the  sparse  data  scenEirio  as  is  seen  by  examining  the  third  column 
of  Tables  5  and  4. 

Roughly  speaking,  what  these  results  show  is  that  the  availability  of  a  dense  set  of  coarse-scale 
data  does  not  change  the  resolution  at  which  reconstruction  can  be  performed  but,  we  obviously 
can  perform  additional  averaging  using  these  additional  data  points,  resulting  in  enhanced  variance 
reduction  as  seen  in  Table  4.  That  is,  if  we  have  several  essentially  redrmdant  measurements  at 
an  SNR  of  4,  their  combined  effect  is  to  enhance  the  apparent  SNR  as  compared  to  the  coarsely 
sampled  case.  In  this  sense,  a  fairer  comparison  is  that  between  the  example  introduced  in  this 
section,  with  high  quality,  but  spcirsely  sampled  coarse  resolution  data  with  the  example  considered 
in  Section  4.1  which  involved  lower  quality,  but  densely  sampled  coarse  resolution  data  (in  both 
cases  full- coverage,  densely  sampled  fine  scale  data  with  SNRf  =  1  are  available).  In  particular,  by 
examining  the  values  of  n”‘({c})  in  Tables  5  and  3,  we  see  that  the  value  of  the  high  SNR,  sparse 
data  set  yc  is  about  equal  to  that  of  the  low  SNR,  full  data  set  as  measured  by  the  information  in 
the  RECM. 
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2  3  1  '  0 


(a)  Solid  lines  =  ni({/,c}).  Dashed 
lines  =  n^({/}).  Dot-dashed  lines  = 

ni({c}). 
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(c)  Solid  lines  =  ni({/,  c}).  Dashed 
lines  =  ni({/}).  Dot-dashed  lines  = 
n:L({c}). 
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(b)  Solid  lines  =  n5i({l,c}).  Dashed 
lines  =  n^({/}).  Dot-dashed  lines  = 

ni({c}). 
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(d)  ni({/},{/,c}). 


Figure  11:  Residual  error  covariance  information  for  the  case  of  SNRf  =  SNRc  ~  3  with 
available  only  near  the  ends  of  the  interval.  For  scales  3  and  4,  (a)-(c)  indicate  that  at  the  ends  of 
the  interval,  the  varitince  reduction  given  both  r/f  and  is  equal  to  that  given  only  y^ .  Alternatively, 
Tjc  impacts  the  RECM  data  primarily  in  the  middle  of  the  interval.  In  (a)-(c),  there  is  some  active 
sensor  fusion  taking  place  as  there  exists  shifts  at  these  scales  for  which  H®  ({/,  c})  dominates  both 
n^({/})  n®({c}).  From  (d),  it  is  observed  that  y^  has  significant  impact  relative  to  y^  in 

lowering  the  variance  of  the  coarsest  scaling  coefficient  estimates  at  shifts  away  from  either  end  of 
the  interval. 
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Figure  12:  The  space- varying,  optimal  scale  of  reconstruction  for  r  =  0.45  given  (1)  the  complete 
set  of  data  and  (2)  the  fine  scale  data  yf  near  either  end  of  the  interval 


Figure  13:  Plot  of  g  (solid  line)  versus  ^0.45  (dashed  line) 
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Figure  14;  Space-varying  optimal  scale  of  reconstruction  as  a  function  of  r.  The  horizontal  axis 
represents  the  shift  n  at  the  finest  sccde,  Mg  =  7  ,  the  verticEd  axis  is  the  value  of  r,  and  the  grey 
tones  represent  the  finest  scale  of  resolution  supported  by  the  data  at  shift  n  using  threshold  r. 
Darker  colors  indicate  finer  s Cedes. 
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(a)  ff({/,c})  for  full  data  case  (solid  line) 
and  case  where  yc  is  available  on  a  sparse 
grid  (dashed  line) 


(b)  s({c})  for  full  data  case  (solid  line) 
and  case  where  yc  is  available  on  a  sparse 
grid  (dashed  line) 


Figure  15:  Estimates  of  g  using  various  combinations  of  data  sets  for  the  decimated  data  experi¬ 
ments 
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Scale  m 

n"‘({/,c}) 

n-({/}) 

n"‘({c}) 

7 

0.0049 

0.0047 

0.0002 

0.0002 

6 

0.0618 

0.0600 

0.0016 

0.0020 

5 

1.2653 

1.1785 

0.0857 

0.0919 

4 

19.6851 

18.4934 

1.8335 

1.5399 

3 

64.4081 

60.5813 

18.9536 

10.0784 

3 

98.5868 

96.7171 

94.4320 

58.5045 

Table  5:  Percent  relative  error  variance  reduction  for  the  inversion  with  SNRf  —  1,  SNR^  =  4 
and  j/c  sparsely  sampled.  Here  the  sparse  availability  of  serves  to  offset  the  information  content 
generated  by  its  high  SNR.  The  overall  utility  of  the  coarse  data  set  here  is  about  the  same  as 
was  the  case  in  the  densely  sampled,  low  SNR  experiment.  Based  upon  the  data  in  the  first  three 
columns,  we  do  see  some  degree  of  active  sensor  fusion  taking  place  for  the  coarsest  scaling  and 
wavelet  coefficients;  however,  the  value  of  alone  is  practically  nil  at  scales  finer  than  3. 

5  Conclusions  and  Future  Work 

In  this  paper,  we  have  presented  an  approach  to  the  solution  of  linear  inverse  problems  based 
upon  techniques  drawn  from  the  fields  of  multiscale  modeling,  wavelet  transforms,  and  statistical 
estimation.  We  begin  with  a  system  of  noisy,  linear  integral  equations  describing  the  relationship 
between  several  sets  of  observed  data,  and  the  function  to  be  estimated,  g.  This  formulation  is 
particularly  useful  in  describing  the  situation  where  there  exists  a  suite  of  measurements  each  of 
which  conveys  information  about  the  behavior  of  g  on  different  scales.  After  discretization,  wavelet 
methods  are  used  to  transform  the  problem  from  real- space  to  scale-space.  A  maximum  a  posteriori 
(MAP)  estimator  serves  as  the  inversion  algorithm  and  produces  an  estimate  not  of  g,  but  of  its 
wavelet  transform,  7.  Regularization  is  achieved  via  a  statistical  model  of  7  which  also  provides 
a  mecins  of  capturing  any  available  prior  information  regarding  the  structure  of  g.  The  structure 
of  this  model  allows  us  considerable  flexibility  in  capturing  the  statistical  structure  of  g,  including 
the  incorporation  of  scale- varying  statistics.  To  illustrate  our  methods,  we  have  used  one  of  many 
possible  statistical  models,  namely  one  that  has  the  l//-like  fractal  structure  that  is  often  posited 
as  a  meaningful  model  for  natural  phenomena.  Moreover,  this  model  leads  to  regularization  that 
is  quite  similar  in  nature  to  traditional,  smoothness-based  regularization  approaches. 

Our  approach  makes  extensive  use  of  scale- space  in  the  analysis  of  linear  inverse  problems. 
By  introducing  the  notion  of  a  relative  error  covariance  matrix  (RECM),  we  have  developed  a 
quantitative  tool  for  imderstanding  quite  precisely  the  various  ways  in  which  data  from  a  multitude 
of  sensors  contribute  to  the  final  reconstruction  of  g.  We  demonstrate  a  method  for  determining 
the  optimal  level  of  detail  to  include  in  the  estimate  of  y  as  a  function  of  spatial  location.  The 
RECM  explicitly  provides  a  means  of  capturing  the  way  in  which  this  level  is  affected  by  changes 
in  levels  of  imcertainty  in  the  different  sources  of  data  and  the  sampling  structure  defining  how  the 
data  is  distributed  in  space.  Also,  the  incremented  benefits  associated  with  the  addition  of  data 
from  another  sensor  is  readily  explored  using  the  RECM.  Finally,  we  have  shown  the  use  of  this 
quantity  in  describing  the  process  of  multisensor  data  fusion  in  a  wavelet  setting. 
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The  RECM  analysis  can  be  of  great  use  in  the  design  of  inversion  experiments.  Because  the 
relative  error  covaricince  matrix  is  not  a  function  of  the  data,  one  can  evaluate  and  therefore  alter 
the  experimental  configuration  prior  to  actually  collecting  data.  Moreover,  having  settled  on  the 
characteristics  of  the  data  sources,  the  RECM  can  be  used  to  understand  precisely  where  in  a 
parameterization  of  g  (i.e  for  which  degrees  of  freedom)  the  data  contributes  useful  and  significant 
information.  Indeed,  the  relative  error  covariance  provides  a  useful  method  for  pruning  a  multiscale 
model  of  g  in  response  to  the  information  present  in  the  data. 

The  vehicle  for  demonstrating  our  techniques  has  been  a  two-channel  deconvolution  problem 
configured  to  mirror  many  of  the  chciracteristics  associated  with  more  general  linear  inverse  prob¬ 
lems.  In  addition  to  performing  the  RECM  analysis,  our  examples  highlight  the  ability  of  a  wavelet- 
based  approach  to  handle  non-full  data  sets.  Specifically,  we  have  considered  the  case  where  one 
source  of  information  was  available  only  near  the  boundaries  of  the  interval.  Additionailly,  we  show 
how  wavelet  techniques  Eire  a  natuTEii  mecms  for  coping  with  a  sparsely  sampled  data  set. 

We  note  that  the  general  methodologies  presented  here  are  not  restricted  to  the  ID  decon¬ 
volution  problems.  Our  techniques  can  be  used  without  alteration  for  one  dimensional  problems 
involving  non-convolutionai  kernels.  Indeed,  in  [43],  we  consider  a  non-convolutioncd  inverse  con¬ 
ductivity  problem  similar  to  those  foimd  in  geophysical  exploration.  Also,  the  extension  of  our 
approach  to  multidimensional  inversions  can  be  accompHshed  quite  easily  and  should  be  of  great 
use  in  the  anEilysis  and  solution  of  2D  and  3D  problems  which  typically  exhibit  more  severe  forms 
of  cdl  the  difficulties  foimd  in  the  ID  case. 

Although  not  considered  extensively  in  this  work,  the  multisccde,  statistically  based  inversion 
algorithms  admit  highly  efficient  implementations.  As  demonstrated  by  the  convolution  kernels  in 
Section  4  and  as  discussed  by  Beylkin  et.  al  in  [4],  wavelet  transforms  of  many  operator  matrices, 
0,  contain  very  few  significant  elements  so  that  zeroing  the  remainder  lead  to  highly  efficient 
algorithms  for  applying  0  to  Eubitrary  vectors.  These  spairseness  results  imply  that  the  least- 
squares  problems  defined  by  the  wavelet- transformed  normal  equations  also  have  a  sparse  structure. 
Thus  computationEilly  efficient,  iterative  algorithms  such  as  LSQR  [48]  cEin  be  used  to  determine 
7-  In  [43],  we  utilize  the  theory  of  partial  orthogonalization  [53]  in  the  development  of  a  modified 
form  of  LSQR.  Our  algorithm  is  designed  for  the  efficient  and  stable  computation  of  7  as  well  as 
arbitrEiry  elements  in  the  error  covsiriance  and  relative  error  covariance  matrices. 

Finally,  in  this  paper,  we  have  presented  a  batch-style  inversion  routine  in  which  the  normal 
equations  are  formulated  and  solved  to  estimate  the  entire  wavelet  transform  of  all  at  once.  A 
natural  extension  of  this  “static”  MAP  estimator  is  a  scale  recursive  inversion  routine  that  generates 
g{Tn)  recursively  stEirting  at  the  coarsest  scale  and  adding  detail  only  where  such  information  is 
supported  by  the  data.  We  note  that  such  algorithms  do  in  fact  exists  for  those  problems  in  which 
one  directly  observes  g  (or  coarse  scale  versions  of  g)  in  additive  noise  [12,13,41];  however,  extension 
of  this  work  to  arbitrary  linear  inverse  problems  requires  the  development  of  a  more  general  class 
of  multiscEile  models  which  eiIIow  for  observations  in  the  form  of  linear  functionals  of  g. 
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